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In this paper, we study the equilibrium behavior of Eigen's quasispecies equations for an arbitrary 
gene network. We consider a genome consisting of A'^ genes, so that each gene sequence a may be 
written as a = (t\(J2 ■ ■ ■ on- We assume a single fitness peak (SFP) model for each gene, so that gene i 
has some "master" sequence Oifi for which it is functioning. The fitness landscape is then determined 



by which genes in the genome are functioning, and which are not. The equilibrium behavior of this 
\^ ' model may be solved in the limit of infinite sequence length. The central result is that, instead of 

(N ■ 

a single error catastrophe, the model exhibits a series of localization to delocalization transitions, 
, which we term an "error cascade." As the mutation rate is increased, the selective advantage 

o 



for maintaining functional copies of certain genes in the network disappears, and the population 

distribution delocalizes over the corresponding sequence spaces. The network goes through a series 

of such transitions, as more and more genes become inactivated, until eventually delocalization 

occurs over the entire genome space, resulting in a final error catastrophe. This model provides a 
y 1 I 

^ . criterion for determining the conditions under which certain genes in a genome will lose functionality 

' due to genetic drift. It also provides insight into the response of gene networks to mutagens. In 

■ particular, it suggests an approach for determining the relative importance of various genes to the 

^ : 

, fitness of an organism, in a more accurate manner than the standard "deletion set" method. The 

O ' results in this paper also have implications for mutational robustness and what CO. Wilke termed 

> 



"survival of the fiattest." 

PACS numbers: 87.23.Kg, 87.16.Ac, 64.90.+b 
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I. INTRODUCTION 

A challenging problem in quantitative biology is to successfully model the evolutionary response of organisms to 
various environmental pressures. Aside from its intrinsic interest, the development of models which can predict the 
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time evolution of a population's genotype could prove useful in understanding a number of important phenomena, 
such as antibiotic drug resistance, cancer, viral replication dynamics, and immune response. 

Perhaps the simplest formalism for modelingjat least phenomenologically, the evolutionary dynamics of replicating 



organisms is known as the quasispecies model 
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This model was introduced by Manfred Eigen in 1971 as a way 
to describe the in vitro evolution of single-stranded RNA genomes 1]. In the simplest formulation of the model, we 
consider a population of asexually replicating genomes, whose only source of variability is induced by point mutations 
during replication. We assume that each genome, denoted by cr, may be written as a = si . . . sl, where each "base" Sj 
is drawn from an alphabet of size S. With each genome is associated a first-order growth rate constant Ha, which we 
assume to be genome-dependent, since different genomes are expected to be differently suited to the given environment. 
The set of all growth rate constants is termed the fitness landscape, which will generally be time-dependent. 

Replication and mutation give rise to mutational flow between the genomes. If we let denote the number of 
organisms with genome cr, then, 

^ = ^Km(CT',cr)'^a' (1) 

a' 

where Kmic' , cr) denotes the first-order mutation rate constant from a' to a. If Pmicr' , cr) denotes the probability that, 
after replication, cr' produces the daughter genome a, then clearly ,cr) ~ i^a-'Prnic' ,cr). To compute pm(cr',CT), 

we assume a per base replication error probability for genome cr (different genomes may have different replication 
error probabilities, since some genomes may code for various repair mechanisms which other genomes do not). It is 
then readily shown that ^, 

Pm{<y\ a) = (^)^«(-^-')(l - e,, )''"''" ^"'"'^ (2) 

where DH{(J,a') denotes the Hamming distance between a and cr'. 

In order to model the relative competition between various genomes, it proves convenient to reexpress the dynamics 
in terms of population fractions. Defining n = n^, and — n^/n, we obtain the system of equations, 

='^K,n{<y' ,Cr)Xcr' - K{t)Xa (3) 
a' 

where nit) = n^Xa, and is therefore simply the mean fitness of the population. 

The above system of equations is physically realizable in a chemostat, which continuously siphons off organisms to 
maintain a constant population size U] . This ensures that growth is not resource limited, so the assumption of simple 
exponential growth is a good one. It should be pointed out, however, that it is possible to introduce a death term 



which places a cap on the population size, without changing the form of the quasispecies equations. If we introduce 
a second-order crowding term (logistic growth), so that. 



dt 



(4) 



then if is genome-independent, it is readily shown that when converting to the x^j the quasispecies equations are 
unchanged. 

The quasispecies equations may be written in vector form as. 



dx 

— = Ax — (K ■ x)x 
dt ^ ' 



(5) 



a tnat o 

m 



where x = (x^) is the vector of population fractions, A — ((A^ct' ~ ^^((t',^))) is the matrix of first-order mutation 
rate constants, and /? = (ko-) is the vector of first-order growth rate constants. For a static fitness landscape, Eigen 
proved that x evolves to the equilibrium distribution given by the eigenvector corresponding to the largest eigenvalue 
of A 

A considerable amount of research on quasispecies theory has focused on the simplest possible fitness landscape, 
known as the single fitness peak (SFP) landscape 0, ^ 7, a ^ ^ ^ ^ • In the SEP model, there exists a single, 
"master" sequence do for which k^^ = fc > 1, while for all other sequences we have k^. = 1. The SEP model assumes 
a genome-independent mutation rate, so that = e for all a. 

The SEP landscape is analytically solvable in the limit of infinite sequence length. The equilibrium behavior of 
the model exhibits two distinct regimes: A localized regime, where the genome population clusters about the master 
sequence (giving rise to the term "quasispecies" ) , and a delocalized regime, where the genome population is distributed 
essentially uniformly over the entire sequence space. The transition between the two regimes is known as the error 
catastrophe, and can be shown to occur when Prep, the probability of correctly replicating a genome, drops below 
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l/k [5j- The error catastrophe is generally regarded as the central result of quasispecies theory, and it has been 



experimentally verified in both viruses [13 
basis for a number of anti-viral therapies 



and bacteria |14j . Indeed, the error catastrophe has been shown to be the 



The structure of the quasispecies equations naturally lends itself to application to more complex systems than 
RNA molecules. Indeed, the model has been used to successfully model certain aspects of the immune response to 

n 

viral infection 15]. However, in their original form, the quasispecies equations fail to capture a number of important 
aspects of the evolutionary dynamics of real organisms. Eor example, it is implicitly assumed that each genome 
replicates conservatively, meaning that the original genome is preserved by the replication process. Correct modeling 



of DNA-based life must take into account the fact that DNA rephcation is semiconservative Furthermore, the 
assumption of a genome-independent rephcation error probabihty is also too simple, since cells often have various 
repair mechanisms which may become inactivated due to mutations In addition, Eigen's model neglects the 

effects of recombination, transposition, insertions, deletions, and gene duplication, to name a few additional sources 
of variability. Thus, a considerable amount of work remains to be done before a quantitative theory of evolutionary 
response is developed. 

Nevertheless, some progress has been made. For example, semiconservative replication was recently incorporated 



into the quasispecies model |l7i | . A simple model incorporating genetic repair was developed in 
been studied in and finite size effects in 2^ 2l| . 
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One area in which more realistic models need to be developed is in the nature of the fitness landscape. As mentioned 
previously, the most common landscape studied thus far has been the single fitness peak. However, geno mes g enerally 
contain numerous genes (even the simplest of bacteria, the Mycoplasmas, have several hundred genes j2j|), which 
work in concert to confer viability to the organism. Therefore, in this paper, we consider the behavior of the model for 
an arbitrary gene network. We assume conservative replication and a genome-independent error rate for simplicity, 
though we hypothesize at the end of the paper how our results change for the case of semiconservative replication. 

This paper is organized as follows: In the following section, we introduce our generalized iV-gene model defining the 
"gene network." We first give the quasispecies equations in terms of the population fractions of each of the various 
genomes. We proceed to the infinite sequence length equations, and then obtain a reduced system of equations which 
dictates the equilibrium solution of our model. We solve the model in Section III. For the sake of completeness, we 
include a simple example to illustrate how our solution method may be applied to specific systems. We go on in 
Section IV to discuss the results and implications of our model, such as the relation to CO. Wilke's "survival of the 



flattest" 
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and also what our model says about the response of gene networks to mutagens. Finally, we 



conclude in Section V with a summary of our results and future research plans. 



II. THE iV-GENE MODEL 
A. Basic Equations 



Consider a population of conservatively replicating, asexual organisms, whose genomes consist of N genes. Each 
genome a may then be written as cr = fxi . . . ctat. Let us assume, for simplicity, a "single fitness peak" landscape for 



each gene. That is, for each gene i there is a "master" sequence (Ti^o for which the gene is functional, while for all 
f i 7^ fj.o the gene is nonfunctional. We assume that the fitness associated with a given genome a is dictated by which 
genes in the genome are functional, and which arc not. We let denote the fitness of organisms with genome 

a such that at = <7ifi for i G {1, . . . , N}/{ii, while ai ^ <Tifi for i G {ii, . . . , z„} (we adopt the convention that 

{ii, . . . , in} = {} = when n = 0). 

The choice of the landscape . . . , in} {1, . . . , N}, n = 0, 1, . . . , N} is arbitrary, so that the activity 

of the various genes in the genome are generally correlated. Thus, the N genes may be regarded as defining a gene 
network. We assume that the fitnesses are all strictly positive. Without loss of generality (i.e., by an appropriate 
rescaling of the time), we may assume that K{i,...^n} = 1- 

The simplest quasispecies equations for this iV-gene model are obtained by assuming a genome-independent per 
base replication error probability e. We assume that gene i has a sequence length Li, and we define L = Li + . . . + Ljy. 
Then p™(a-', a) = p,„(cri, fii) • . . . • Pm(o-^, ctn), where, 

Putting everything together, we obtain the system of equations, 

a[ a'^ 1=1 

^^^^ 

S(i)xCTi ...(Tjv C^) 

Define the Hamming class Cnih, • ■ • , In) = {ct = cti . . . aNlDnicri, Cj.o) = 'i, « = 1, . . . , N}. Also, define ^i^, = 
5I/<TeCff (ii In) symmetry of the landscape, we may assume that Xa- depends only on the h corresponding 

to cr, and hence we may look at the total population fraction in Cnih, ■ ■ ■ , In) and study its dynamics. The conversion 
of the quasispecies equations in terms of Xa- to z/i,...,;jv is accomplished by a generalization of the method given in 
The result is. 



dt 



Li—li li 



Ln — ^n N 



E E 

il,l=0 il.2=0 

e'-(l-e) 



k 



E E n 

In, 1=0 ijv,2=0 4=l 

S — 1 S 1 



Li — li ~ lis + h,2 \ (h.l ~^ h ~ I 



^h,i+h~h,2:---,^N,i+l'N—h 



-K{t)zi^. 



(8) 



We now let the Li ^ oo in such a way that the a.; = Li/L and fi = Le remain fixed. We assume that the a.; are 
all strictly positive (allowing an to be leads to certain difficulties which we choose not to address in this paper). 
Because the probability of correctly replicating a genome is simply (1 — e)^ — > e^'', fixing /i is equivalent to fixing the 
genome replication fidelity in the limit of infinite sequence length. 

In this limit, it is possible to show that, for each gene i, the only terms in Eq. (8) which survive the limiting 

1 



process are the In = terms 



This is equivalent to the statement that, in the limit of infinite sequence length. 



backmutations may be neglected. We also obtain that, 



and 



7 r T^("»m) ' (9) 



^g-".M (10) 



The final result is. 



J h In 



dt ^ ^ I'A- ...-l' 

l'=0 i'=o ^ ^ 



X 



N- 



(ai^)''i • . . . • (aAr/i)''"z,,_;'^,...,i„-i;^ 

-m^iu-jN (11) 

It should be noted that the neglect of backmutations is only valid when one can group population fractions into 
Hamming classes. In our case, by the symmetry of the fitness landscape, the equilibrium solution only depends on the 
Hamming class, and hence, to find the equilibria, it is perfectly valid to "pre-symmetrize" the population distribution 
and study the resulting dynamics. 

Thus, when studying dynamics, it is generally not valid to neglect backmutations. For example, consider a single 
fitness peak landscape, and suppose that a population of organisms is at its equilibrium, clustered about the fitness 
peak. If the organisms are then mutated, so that they are shifted away from the fitness peak, then eventually they 
will backmutate and reequilibrate on the fitness peak (this situation has been observed with prokaryotes 2^). If we 
imagine that the mutation shifts the organism from the master genome cto to some other genome a' ao, then it is 
clear that the landscape is not symmetric about cr', and furthermore that the population distribution is not symmetric 
about (To- Thus, Eq. (11) does not apply. To correctly model the reequilibration dynamics, it is necessary to consider 
the finite sequence length equations, and explicitly incorporate backmutations. 
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B. Reduced Equations 

Because of the neglect of backmutations, Eq. (11) may in principle be solved recursively to obtain the equilibrium 
distribution of the zi^^___^iff at any ji, assuming we know the equilibrium mean fitness, denoted K{t = oo). The 

problem, of course, is that R,{t — oo) needs to be computed. This may be done as follows: Given any collection 
{ii, . . . , in} C {1, . . . , N} of indices, define via, 

oc oc 

where ei = (1, 0, . . . , 0), 62 = (0, 1,0,..., 0), and so forth. Thus, is simply the total fraction of the population 

in which the genes of indices {ii, . . . , in} are faulty, while the remaining genes are given by their corresponding master 
sequences. 

The dynamics of the is derived in Appendix A. The result is given by, 



n-l 



^==0 {ii,...,jfe}c{ii,.--,«r.} »e{ii,.--,»r.}/Oi,---,ifc} 

(13) 

We can provide an intuitive explanation for this expression: Because backmutations may be neglected in the limit 
of infinite sequence length, it follows that, once a gene is disabled, it remains disabled. Therefore, given a set of 
indices {ii, . . . , in}, mutational flow can only occur from to Z{ji,...,j^} for which {ii, . . . , in} C {ji, . . . ,jm} 

(in this paper, if Oi c fl2, then fli is a proper subset of 0,2- If f^i C fl2, then either fli is a proper sub- 
set of fl2 or Cli = d2). Similarly, z^i-^ i^y can only receive mutational contributions from ... j^j for which 
{ill • • • )im} C {ii, . . . , in}. For such a {ji, . . . ,jm}, the probability of mutation to {ii, . . . , in} may be com- 
puted as follows: Because the genes corresponding to the indices ji, . . . , remain faulty, the neglect of back- 
mutations means that it does not matter whether these genes are replicated correctly or not. All genes with 
indices in {I, . . . , N}/{ii, . . . ,in} must remain equal to the corresponding master sequences after mutation. The 
probability that gene i replicates correctly is given by e""*'^, so the probability that all genes with indices in 
{1, . . . ,N}/{ii, . . . ,in} replicate correctly is nie{i,...,Ar}/{ji,...,i„} e""*** = e~^^~"*i~ ' ~"*"^'*. The genes which must 
be replicated incorrectly are those with indices in {ii, . . . , in}/{ji, ■ ■ ■ ,jm}- Since each such gene replicates incorrectly 
with probability 1 — e~"'^, it follows that the probability of replicating all genes in {ix, . . . ,«n}/{jii • • • ,3m} incor- 
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rectly is nie{ji,...,i„}/{ji,...,jv„} (1 ~ e~"'^^). Putting everything together, we obtain a mutational flow from ^{ji,...,jv„} 
to i{ii,...,i„} of e-(^-"'i-----"'")''K{ji,...j^}%i,...j^}nie{ii,...,i„}/{ji,...,j„}(l^ Summing over all possible 

{ji, . . . , j„i} C {ii, . . . , in} gives us the expression in Eq. (13). 

Note that K{t) = J2n=o i„} ••.»n}^{«iv.»n}' ^° '^^ need to solve Eq. (13) in order to obtain the equilibrium 

distribution of the model. 



III. SOLUTION OF THE MODEL 

In this section, we proceed to solve the reduced system of equations given by Eq. (13). Since this provides us with 
K{t = oo) and zo,...fl = it follows that we can recursively solve for the equilibrium values of all ^^d, 
In vector notation, Eq. (13) may be expressed in the form, 

rll ^ ^ -, 

- = m-{n-~z)z (14) 

where z is the vector of all k is the vector of all and B is the matrix of mutation rate constants. 

Because of the neglect of backmutations in the limit of infinite sequence length, different regions of the genome space 
become mutationally decoupled, so that the largest eigenvalue of the mutation matrix B will in general be degenerate. 
Thus, the equilibrium of the reduced system of equations is not unique. However, for any initial condition, the system 
will evolve to an equilibrium, though of course different initial conditions will yield different equilibrium results. 



A. Definitions 

In this subsection, we define a variety of constructs which we will need to characterize the equilibrium behavior of 
our model. We begin with the definition of a node: We define a level n node to refer to any collection of "knocked out" 
genes with indices {ii, . . . , i„} C {1, . . . , N}. The reason for this terminology is simple. We may imagine the set of 
all nodes to be connected via mutations. Because of the neglect of backmutations, it follows that a node {ii, . . . , in} 
is accessible from a node {ji, ■ . ■ ,jm} via mutations if and only if {ji, . . . , jVn} C {ii, . . . , z„}. The result is that we 
can generate a directed graph of mutational flows between nodes, an example of which is illustrated in Figure 1. 

Given some node u = {ii, . . . , define = {/> C {1, . . . , N}\u C i>}. Therefore, Gi, may be regarded as the 
subgraph of all nodes which are mutationally accessible from u. An example of such a subgraph is illustrated in Figure 
2. 
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Level 1 



FIG. 1: The directed graph of mutational flow between nodes for a three-gene network. 

Let fl denote any collection of nodes. Then we may define Gq = {J^^qG^- Furthermore, define Q = {u G 
Ci\n n Gi, = v}. Thus, Ci is the set of all nodes in O such that no node in f2 is contained within the mutational 
subgraph of any other node in fl. Figure 3 gives an example showing the construction of from f2. 

Given some node {ii, . . . ,in}, define Keff{{ii, . ■ ■ ,i„};/ti) = K;{ij_...^j„}e~(^~"*i~'~"*"^'*. We then define KmaxilJ') = 
max{Keff{u; C {1, . . . , N}}. Finally, given some ji, define O-maxilJ) = {z^ C {1, . . . , N}\Keff{i'; n) = Kmax(M)}- 

With these definitions in hand, we are now ready to obtain the structure of the equilibrium solution at a given ji. 

B. Equilibrium Solution 

1. Determination of K{t = oo) 



We claim that K{t = oo) = KmaxilJ-)- We prove this in two steps. First of all, we claim that K{t = oo) = Keffiy; n) 
for some node u. Clearly, because X^^c{i,...,jv} = 1, it follows that at least one of the > at equilibrium. Let 
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FIG. 2: The mutational subgraph G{i,3} for a four-gene network. 
u' = {ii, . . . , in} be a node of minimal n such that > 0. Then it should be clear that, at equilibrium, we have, 

= — ^ = {Keff{v'\ H) - K{t = 00))Z^' (15) 

at I t—oo 

which, since i^/ > 0, may be solved to give K{t = oo) = Kef/iv'; /u). 

So now suppose that R{t = oo) ^ Kmaxil^)- Then K{t = oo) < KmaxifJ-)- Such an equilibrium can never be observed 
because it is unstable. To see this, let Vmax denote a node for which Keff^Vmaxif^) = i^max 

ill). Then from Eq. (13) 

we have, at equilibrium, that, 

dz I 

= — > {Keffil^max; /«) - «(* = Oo))z^^^^ (16) 

at I t=oo 

and so z^,^^^ = 0. Clearly, however, any perturbation on z^,^^^ will push z^,^^^ away from its equilibrium value. This 
equilibrium is therefore unstable, and hence, unobservable. 

Note that since R{t = oo) = Kmaxilj), it follows that the mean equilibrium fitness is a continuous function of yu. 
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{1,2,3} {1,2,4} {1,3,4} {2,3,4} 



13,4} 



■{1,2,3,4} 



FIG. 3: Illustration of f2 and in a four-gene network. The nodes circled with rectangles and circles constitute fi. The nodes 
circled only with rectangles constitute J7. 

2. Determining the ^{ii,...,i„} 



To find the equilibrium solution of the reduced system of equations, we first need to determine which = at 
equilibrium. To this end, we begin with the claim that, for /j, > 0, z^, = milcss i> G Gq f^^y For suppose there 
exists u ^ ^?ima.x(iJ.) ^^^^ t^^t Zi, ^ at equilibrium. Then out of the set of all nodes which satisfy the above two 
properties, we may choose u to be of minimal level. We claim that, for any u C u, we have that i/ ^ ^hmaxW^ ^^'^ 
otherwise it is clear that v e Gt, C Gq (^) ^<=. Therefore, by the minimality of the level of it follows that 
zp = whenever j> is a proper subset of u. But then the equilibrium equation for Zi, gives K{t = oo) = Keffiu; fx), 
and so Keff{y;ij) = Kmax(M)- Therefore, v € i^maxil^)- However, by assumption, u ^ ^maxifJ-), which means that 
Gi, contains nodes in flmaxil^) which are distinct from u. Denote one of these nodes hy i> = {ji, . . . ,jm\- Then at 
equilibrium we have, from Eq. (13), that, 

dzo I 

at I t=oo 



-(l-«Jl-----"jm)M 



EE 
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> (17) 

which is clearly a contradiction. This establishes our claim. 

We now argue that our equilibrium solution may be found if we know Zv for y G ^maxifj)- We claim that for any 

Zu= ^ f3pu{tJ')zp (18) 

where the /Jjjy > 0, and for > a given is strictly positive if and only if u G Gp. The above expression then 
holds for all u, since we simply take (3^,^ = for u ^ (p) ■ 

We can prove the above formula via induction on the level of the nodes in G^^^^^^y In doing so, we will essentially 
develop an algorithm for constructing the fipv So, let us start with rimin, the minimal level nodes G^^^^^^y Then 
clearly v e ^^max{^^), so that /Spi, = Spv, hence the formula is correct for rimin- So now suppose that, for some 
n > riming the formula is correct for all m such that rimin <m<n. Then for a level n + 1 node in Gq ^^^y denoted 
by {ii, . . . we have, at equilibrium, that, 

= («e//({«l,---,Wl};/^) -«maa;(M))%i,...,i„+i} 

n 

fc=o {ji,.-.,i)i}c{n,...,i„+i} 
Hh,-,Mhju-,M n (1-e-"'^) 

ie{ii,...,i„+i}/{ji,...,jfe} 
= (Ke//({«1, • • • , Wl}; M) - '«max(M))^{ii,...,i„+i} 



-(l-aii-----ain+i)'* 



i'C{ii,...,i„+i},!^eGs~^^^^(^) !5enmax(/i) 
ie{n ,...,i„+i}/i/ 

= ('«e//({«l, • • • , Wl}; M) - Kmaa:(M))^{ii,...,i„+i} 



+e" 



-(l-"ii-----"i„+i)M 



n (l-e-"-") (19) 



ie{n,...,i„+i}/i' 
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Now, if {ii, . . .,in+i} e ClmaxilJ'), then = 5p{ii,...,i„+i}. Otherwise, Keff{{ii, . . . ,i„+i;/i) < Kmax{lJ), so 

the equiUbrium equation may be solved to give, 

g-(l-aii-----ai„+i);* 
/3i.{ji,...,i„+i} Kmaxilj)- l^eff{{h-,----,in+l}\lj) ^ 

n (l-e-"*'') 

j6{ii,...,i„+i}/i/ 

(20) 

Note that > 0. Furthermore, if {ii, . . . ^ Go, then no proper subset of {ii, . . . ,in+\} is in Gjj. 

Therefore, {v C . . . € G^} = 0, so = 0- Conversely, if . . . ,in+i} G Gp, then since 

{ii, . . . , i„+i} ^ V, it follows that {z^ C {h, . . . , z„+i}|z^ G Gp} 0. Therefore, the sum in Eq. (20) is nonempty, 
hence, since the fiov appearing in the sum are all strictly positive, it follows that /3!/{ii,...,i„_,_i} > 0. This implies that 
/3i>{ii,...,i„+i} is strictly positive if and only if {ii, . . . G Gv, which completes the induction step, and proves the 

claim. 

For each i/ G ^max{^J)^ we can define i^c, = "YlfveG- 1^'^'^' ^^'^ then define 7i>i^ = Pvi^/np and wp = t^vvZv- If, for each 
i> we also define 'yo = {ivv), that is, the vector of all 7^^, and if we define z = (i^), then we obtain, 

z = ^ wv^p (21) 

where E;>gfw(M) = 1- 

Note that the 7,5 form a linearly independent set of vectors. Therefore, if flmaxifJ-) contains more than one node, then 
the equilibrium solution of the reduced system of equations is not unique, but rather is defined by the parallelipiped 

{Epen^axM «^57-l Epea_(^) wp = l,wp> 0}. 

As mentioned earlier, the degeneracy in the equilibrium behavior follows from the neglect of backmutations in the 
limit of infinite sequence length. The various nodes in (imaxil^) become mutationally decoupled in this limit, which 
can cause the largest eigenvalue of the mutation matrix B to be degenerate. However, for finite sequence lengths, the 
quasispecies dynamics will always converge to a unique solution. In particular, if we start with the initial condition 
Z0 = 1, then for finite sequence lengths we will converge to the unique equilibrium solution. Because all nodes are 
mutationally connected in the infinite sequence length limit with this initial condition, we make the assumption that 
the way to find the infinite sequence length equilibrium which is the limit of the finite sequence length equilibria is 
to find the infinite sequence length equilibrium starting from the initial condition Z0 = 1. This allows us to break the 
eigenstate degeneracy in a canonical manner. 
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In the appendices, we describe a fixed-point iteration approach for finding the equilibrium solution of the model. 
Within this algorithm, we also use the initial condition 00 = 1 as the analogous approach to the one above for finding 
the infinite sequence length equilibrium which is the limit of the finite sequence length equilibria. 

Finally, the treatment thus far has been finding the equilibrium solution of the reduced system of equations for 
/U > 0. The equilibrium solution for = is obtained by taking the limit of the fj, > solutions, so that z{iJ, = 0) = 
lim^^o+ 

3. Construction of the phase diagram 

From the previous development it is clear that the nodes in ^max{^^) may be regarded as "source" nodes which 
dictate the solution. To understand how the solution changes with /x, we therefore need to determine how O^axCM) 
depends on n. 

We claim the following: That there exist a finite number of /i, which wc denote by /zi, . . . , /xjv, where < /Ui < . . . < 
fiN < 00, for which {{K[i^^,„^i^y,ai^ + . . . + Q;i„)|{ii, . . . € ilmax{f^)} contains distinct elements. In any interval 
(/Xj-i, /Uj), Clmax{^i) is Constant, and may therefore be denoted by fij. The fli are all disjoint, and OjUfij+i C Clmaxil^i)- 

We begin proving this claim by introducing one more definition. Let denote the set of all sets of nodes, such 
that a collection of nodes f2 is a member of T,ji if and only if {{n{ii,...,i„},cxii + . . . + Q;i„)|{*i, • • • , in} & ^} contains 
distinct elements. 

Note that since there are 2^ nodes, there are 2^" sets of nodes, hence consists of a finite number of elements. 
Given some fl^ G S^, wc claim that ^lrnax(p) = for at most one /i. To show this, suppose that there exist 
Hi < IJ.2 for which f2max(Mi) = ^max(A*2) = ^i^- Choose any two nodes {ii,...,i„}, {ji,...,im} in and 
note that K;{ij_...^j^}.e~^^~"'i~'"~"'''^''i = K{j^^...j^}e~(^~"^i~'"~"^"'^'*^ = Kmax{^J■^^), and similarly for /X2- However, 
Q^g-6icc ^ a2e-''=^ and aie'^^y = a2e-^^y implies that e"''!*?'-^) = e-''^^^'-^), so that hi = 62 and hence ai = 02. 
Therefore, = and + .+ai„ = oiji+- ■ so +. . . . . ,i„} G 17^} 

does not contain distinct elements. Because this contradicts our assumption about O^, it follows that O.^axipi) = 
for at most one /U. 

So, since contains a finite number of elements, it follows that there are a finite number of (jl for which Vtmaxil^) 
satisfies the property that {(^{ii,. ..,»„}, ccij + . . . + ai„)|{ii, . . . , in} G ^max{^J)} contains distinct elements. We can 
denote these by /^i, . . . , fiN, where we assume that < /xi < . . . < hn < 00. 

Note that if a collection of nodes Q. has the property that O 7^ O, then Q. must be a collection in S^. This is easy 
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to see: fl contains some {ii, . . . ,i„} for which there exists a distinct {ji, . . . ,jm} S O where {ji, . . . ,jm) G 
Therefore ai^ + . . . + at^ < aj^ + . . . + aj^ , which proves our contention. 

We now prove that ^maxifJ-) is some constant, which we denote by ri,, over /Xj). Given some /xq G {fj,i-i, iJ.i), 

let = sup{/t e {i-io, l-t'i)\^max{l-t') = ^max(Mo) V /U G (/zo,/i)} (sup stands for "supremum", which is the least upper 
bound of a set of real numbers. If S is a set of real numbers with an upper bound, then A = sup S exists, and satisfies 
the following properties: (1) A is an upper bound for S. (2) If B is any upper bound of S, then A< B. (2) If B < A, 
then there exists at least one element of S which exceeds B.). Clearly, < fii. We claim that /i+ — fJ-i- To show 
this, note first of all that ^maxilJ-) = ^max(Mo) for all G (/Uq, and that for any jl > , there exists fj, G jl) 
such that 0,max{lj) 7^ ^maxil-i'o) ■ For given any fi' G (/io,M+), we have, by definition of sup, that there exists some 
// G ifi', such that O.maxilJ') = ^maxifio) for all n G {no, A)- In particular, this implies that O.maxilJ'') = ^maxif^o)- 
Furthermore, if there exists ft > for which ftmaxil^) = ^maxil^o) for sll /x G then i^maxil-^) = ^maxif^o) for 

all fj. G (/iO)A), contradicting the definition of /U+. 

Now, suppose Slmax{lJ'+) ^ ^ji- Then we can write K{ii,...,i„} = and + . . . + = a+ for all {ii, . . . ,i„} G 
f^maa;(A'+)- Then siucc Kmax{lJ'+) = ^+6"^^""+^''+ , it follows by continuity that K+e~(^~"+^'^ > K,eff{v;y) for 

^ ^max ilJ'+ ) in some neighborhood — 5, /U+ + 5) . But this implies that ^max (m) = ^max ) over —S, ^+ + 5). 
Since ^^^^^(/io) = i^max(M) over - we obtain that ClmaxilJ') = ^maxil^o) over (/Uo,M+ + contradicting 

the definition of 

We have just shown that 0,rnax{l-^+) G S/. Since rimaa;(M) ^ over we must have that /i+ = /Zj. Using 

a similar argument with inf, we can show that ^maxilJ-) = ^max(Mo) over (/ii_i,/io), and so ^max{l^) is constant on 
{fjbi-i, fjbi) (inf stands for "infimum", and is defined as the greatest lower bound of a set of real numbers. It satisfies 
properties analogous to those of sup) . 

Suppose for two z,j with i < j, we have J7j and Q,j are not disjoint. Then they share at least one node, and 
so, by the nature of the two sets, we must have that 0^ = fij. Define k to be for any node in Oj, Oj, 

and a to be ail + ... + Oiin- Now, ^maxifJ-i) contains some node {ii, . . . ^ such that Ke//({ii, . . . /u) < 
^g-(i-a)^* £qj. ^ U (pj-i, fj,j). But if for a;i < 0:2 we have that aie~''^^^ < a2e~''^^^ and aie~''^^^ < 

a2e-^^'^^, then (ai/a2)e-(''i-''2)^i < 1 and (ai/a2)e-(''i-''2)=^2 < 1. Since (ai/a2)e-(''i-''2)=^ is monotone decreasing 
or increasing, it follows that (01/02)6"^''^"''^^^ < 1 on (a;i,X2), or equivalently, aie~^'^^ < 026"''^^. Therefore, 
K'maxil^i) = 'te/zClH) • • • )^n}; Mi) < Ke"^^""^''* =>'^=. The fJj are thus all disjoint, as claimed. 

Finally, since Kmaxifj) is continuous, we have that lim - Kmaxil^) = KmaxifJ'i)- If G fij, then this gives 
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i^maxil^i) = Keff{iy;fii). Similarly, considering lim ^+ Kmaxil^) gives that Kmaxil^i) = i^eff{i^]IJ-i) for y G ^i+i- 
Therefore, C Q.max{lJ-i), so fij UOj+i C Q.max{lJ-i), as claimed. 

The various Vti may therefore be regarded as defining dififerent "phases" in the equilibrium behavior of the model. 
Physically, each "phase" is defined by a set of "source nodes," which dictate which genes in the genome are knocked 
out, and which are not. The transition from fij to corresponds to certain genes in the genome becoming knocked 
out, and perhaps other genes becoming viable again. This transition can happen more than once, and so we refer to 
the series of fij — > ilj+i transitions as an "error cascade." 

Because /te//({l, • . • , N}; fi) = 1, for sufiiciently large /x, Ke//({1, • . • , N}; fi) > Keff{y; fj) for any y ^ {1, . . . , iV}. 
Therefore, for sufiiciently large /i, ^maxifj) = {{1, • • • , -^}}- Since ^maxifj) is constant on (/ijv,oo), it follows that 
^maxitj) = {{1, • • • , N}} on (/xjv, oo). Thus, the final transition from fijv to O/^+i corresponds to delocalization over 
the entire genome space, which is simply the error catastrophe. 

4- Finding the zi^^,..^ij^ 

Once we have determined R{t = oo), we can in principle obtain the population fractions zi^^,,,^ij^ in the various 
Hamming classes. The problem is that, if Z0 = 0, then for any finite values of /i, . . . , we get that = 0. To 

show this, suppose we can find h, . . . ,In such that > at equilibrium. Of the /i, . . . , Zat for which zi^^,,,^i^ > 0, 

choose a set of indices such that l[ + . . . + 1']^ is as small as possible. Note that if zi^^,,,^i^ = zi'^-i'^^,,,^i'^-i'i;^, 

with (/'/, . . . , Z^) 7^ (0, . . . , 0), then zi,,...,i^ = 0. 

Now, let the nonzero l'^ be denoted hy l'^^,. .. ,1^^. Then /v;'^^...^;^ = K^jj ... ^^j, and we have, from Eq. (11), that, at 
equilibrium, 

° = ^^^^ I ^^^= («{i„...,i„}e-'' - R{t = oo))z,i,...,,^ (22) 

which gives K{t = oo) = K{ii,....i„}e~''. But, K{t = oo) > K{ji,...,i„}e~'^^~"*i~---~"*"'''. Therefore, > 
g-(i-aii-...-ai„)/i^ g^j^j so ail + ... + ctin = 0' hencc n = 0. But then 2i'i,...,;'^ = Z0 > =X=. This proves our 
claim. 

If f2max(A*) = 0, then the above claim does not present us with any problem. We can simply recursively solve 
Eq. (11) at equilibrium for all the zii....jijy. But once any delocalization occurs, it is impossible to solve for the 
equilibrium distribution in terms of the Hamming classes. However, we can recursively obtain the distribution of 
another class of population fractions, as follows: Given some collection of indices {ii, . . . ,in}, another collection of 
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indices {ji, . . . ,jk} C {ii, . . . ,i„}, and a collection of Hamming distances h,... ,In, we define ^{ji,...,jfc}(^{ii,...,i„}) 
and 2;{ji,...,jfe}(/{ii,...,i„}) as, 

oo oo 
oo oo 

It is possible to show that, 

k 

Hji,-,jk}ikiu-,ir.}) = X! X] 

and hence that, 

k 

hh,-,jk}ikil,-,in}) = X]^"^)*""' X ^{j(,...J,'}(^{n,-,in}) (25) 

We may then derive the expression for Since the derivation uses techniques similar to those 

used in Appendices A and B, we simply state the final result, which is, 

dZ{n,...,i„}{l{n,...,^„}) ^ ^_(i_«.^_..._a,„V^ X X n X 

fe=o{ji,...jfc}c{ii,...,i„} i;=o ie{i,...,JV}/{ii,...,i„} 

i£{l,...,Ar}/{n,...,i„} 

nje{n,...,i„}/Oi,...,i.}(l-e""^'')x 

-K(i)i{H,...,i,}(r{i„...,i^}) (26) 

where K{ji,...,jfc}('{ii,...,i„}) = '^{ii,.--,ifc}u{ji,...,i,'}) where . . . , j;} are the indices of the nonzero Hamming distances 
in l{ii,...,ir,}- 

We claim that, at equilibrium, 2^(1^) > only if u G Ga for some i> G i^maxil^) for which > 0. For if Zv{lv) > 0, 
let u = . . . ,i„} C 1/ be a node of minimal level for which there exists It, such that z^ih) > 0. Note then that 
for any proper subset {ji, . . . ,jk} C {ii, . . . ,in}, we must have that •S{ji,...,j^}(^{ii,...,j„}) = 0. This imphes that, at 
equilibrium, 

dZ{h,...,in}{hiu-,in}) 







dt 

^-{l-ai^-...-ai„)n ^ 



E n 



i>.\ 

i'=o iG{l,...,N}/{ii,...,in} 

ie{i,...,jv}/{ii,...,i„} 



X 



18 

l^{ii,...,i„}{l{ii,...,i„} - ^ii,...,i„}) X 

%ii,...,i„}(^{ii,...,i„} - 

-K{t = oo)%,,...,i„}(r{i,,...,i„}) (27) 

Among all for which Z{i^,...,i^}{l{i^,...,i^}) > 0, there exists an j^j. such that J2ie{i,...,N}/{iu-,i„}^'i 

minimal. Then we obtain, 

Q ^ <^%,...,»n}(^"{'n,...,i„}) I 

dt I t — OO 

= ('^{n,...,i.}a7n,...,i„})e-('-"---"^")''-«(i)) X 

%i,...,in}(4'n,...,^„}) (28) 

which gives K{t = oo) = K{ii,...,i„}(^{jj .^-^)e~(^~"*i Now, let denote the indices of the 

nonzero Hamming distances in ...,,„}. Then = K{ii,...,i„}u{ii,...,i;„}- But since R{t = oo) > 

'«{ii,...,i„}u{ii,....i;„}e "''^ "'m-*'^^ -^ve get a^'^ + ... + cti;„ = 0, so to = 0. Therefore R{t = oo) = 

Keff{i>; n), so since zo > 0, we have i/ G fiTOax(M)- 

The S^(Z^) may be obtained recursively from Eq. (27), starting with the values of z„ for v e J^max(M)- The idea is 
that, starting with the values of z^, for u G O.maxilJ'), we may compute z„{l„) recursively. We then proceed down the 
levels, computing the Zi,{li,) using the values of 5y(/i/ — l^) and Zi}{lp) for i> C i'. Note then that instead of computing 
the zi^^,,,^i^, which will be as soon as any delocalization occurs, we first sum over a set of gene indices containing 
the delocalized genes as a subset, and only compute the population distribution for finite Hamming distances of the 
localized genes. 



C. Localization Lengths 



In this subsection, we compute various localization lengths associated with the population distribution. Specifically, 
given a node {ii, . . . ,in}, and some i ^ {ii, . . . ,in\, we define two localization lengths, ..,,„} ^"^^ (^i){ii,...,i„}' 

as follows: 

oo oo oo 

{k){ii,...,i„}= ^ ••••• ^ kzu^ei,+...+h„ei^+hei 

(29) 

oo oo oo 

(^»){ii,...,i„} = XI X] kzii^ei,+...+U„,ei„,+Uei 

(30) 
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Note that, 



(«i){H....,.„} = E E (^^{ju...M (31) 

k=0 {ji,...,jfc}C{ji,...,i„} 



and so, in analogy with zi^i-^^,,,^^^^ and ^{ii, ...,«„}, we have that. 



(«'"^){n.....„}=E(-l)""' E ^^in,.....} (32) 

fc=0 {jiv.jfc}C{ii,...,i„} 

We also define the localization length (Zj) by, 

oo oo 

(Z,) = ^....- ^ Uu,...,;. (33) 

Note that {k) = {li){i,...,N}/{i} = E^=o^ E{ii,...,i„}c{i,...,jv}/{i} and so is finite if and only if all the 

are finite. 

We can compute (Zi)^^^ ^j^j at equilibrium by finding the time derivative and setting it to 0. In Appendix B we 
show that. 



( ')fa,...,»n} ^ (Ke//({n,---,«n,«};/i)-K(0)(^i){n,...,i„} 

n-l 

+e~^^~"*i E E ^ 

fc=0 {ji,---,jfe}C{n,...,in} 
('«{ji,..-Jfc,i}(^»){ji,...,Jfc} + °'i^^'^{ju■■■dk}^{jl,■■■Jk} + "iM«{ji,...,jfc,i}%i,...,jfe,i}) 

n {l-e-^^n (34) 

j6{ii,...,i„}/{ji,...,jfe} 

Therefore, at equilibrium, we get, 

g-(l-aij -ai)Ai 

(^i){n,...,i„} = ^ oo) - Ke//({U, . . . , i}; m) (''^^i--'">^{"'--'"} + '^{iu-.in,i}Hn,-,in,i}) 

g-(l-a,,-...-a,„-aOA' 
Kit = 00) - ACe//({*l, . . . , i}; m) 

n {l-e-^n (35) 

je{ii,...,i„}/{ji,...,jfe} 

We can characterize the behavior of the {h){i^^,,,^i^y- First of all, we claim that {li){i^^,„^i^y = if and only if 
= 0. Secondly, we claim that = oo if and only if {ji, . . .,jk,i} € Clmaxil^) with > 

for some {ji, . . .,jk} C {h, . . . ,i„}. 
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To show this, note first of all that, from physical considerations, {h)[i^^,„^i^-^ = if = 0. If > 0, 

then {ii, . . . ,in,i} S G'a„^^(^)) and so, since K{t = oo) > Keff{{ii,...,in}',f^), it follows that {h)[i^^,„^i^-^ > 0. This 
establishes the first part of our claim. 

So now suppose that {ji, ■ ■ ■ ,jk,i} & ^maxilj), with > for some {ji,...,ife} C {ii,...,i„}. Then 

R{t = oo) = Ke//({ii,...,ife,i};M), and so, 

^-(l-aj^-...-aj^,-ai)n 

il^){n,-,M = "^^s(t = oo)-,.e//({ii,...,ifc,i};M) 

'^iji,-,jk,i}Hh,-,jk,i} = 00 (36) 

which of course implies that {li)^^^ .^^ = 00. 

To prove the converse, let us suppose that {li)^^^ = oo. Let us choose {ji, . . . ,jk} C {ii, . . . to be the 
minimal level subset for which {h) = oo. Then if K{t = oo) > Ke//({ii, • • • ,jk,'i}',n), it is clear from the 
expression for ^j^ j^y/dt that {h) [j'^^,„j>j = oo for some . . . , j;'} C {ji, . . . ,jk}, with < / < A; - 1. But 
this contradicts the minimality of k, hence K{t = oo) = Keff{{ji, ■ ■ ■ ,jk, i}', fj), so since > 0, it follows that 

{ii) • • • )ifeiO ^ ^maxifj)- This proves the converse, which establishes the second part of our claim. 



D. A Simple Example 

We now illustrate the theory developed above using a simple two-gene "network" as an example. We assume a 
genome containing two identical genes, so that ai = a2 = 1/2, and we choose the following growth parameters: 
K0 = 10, K[i} = K^2} = 5, and K{i,2} = 1- 

With these parameters, the system exhibits two localization to delocalization transitions. First, for /j, ^ [0, 2 In 2) we 
have i^maxin) = 0- For n G (21n2,21n5) we have ^max{^^) = The error catastrophe occurs at = 21n5. 

We determined the equilibrium behavior of the model by solving the finite sequence length equations for L = 40 
and 5 = 2. The details may be found in Appendix C. Figure 4 shows a plot of R{t = oo) from the simulation results 
and from our theory. Figure 5 shows plots of zg, -S{2}) and -S{i,2} from the simulation results and from theory. 



IV. DISCUSSION 



The first point to note about the solution of the quasispecies equations for a gene network is that, unlike the single 
gene model, which exhibits a single "error catastrophe," the multiple gene model exhibits a series of localization 
to delocalization transitions which we term an "error cascade." The reason for this is that as the mutation rate 
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FIG. 4: Plot of R{t = oo) from both simulation and theory. 

is increased, the selective advantage for maintaining functional copies of certain genes in the genome is no longer 
sufficiently strong to localize the population distribution about the corresponding master sequences, and delocalization 
occurs in the corresponding sequence spaces. 

The more a given gene or set of genes contributes to the fitness of an organism, the larger /i will have to be to 
induce delocalization in the corresponding sequence spaces. Eventually, by making fi sufficiently large, the selective 
advantage for maintaining any functional genes in the genome will disappear, and the result is complete delocalization 
over all sequence spaces, corresponding to the error catastrophe. 

The prediction of an error cascade suggests an approach for determining the selective advantage of maintaining 
certain genes in a genome. Currently, the standard method for determining whether a gene is "essential" is by 
knocking it out, and then seeing if the organism survives. By knocking out each of the genes, one can construct a 

n 

"deletion set" for a given organism, consisting of the minimal set of genes necessary for the organism to survive 27]. 
While knowledge of the deletion set of an organism is important, it does not explain why the organism should 
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FIG. 5: Plots of zji}, 5{2}, and 2{i,2} from both simulation and theory. By symmetry, ^{i} ~ if{2} ~ 1/2 when Clmax{l-i.) = 
{{1},{2}}. 

maintain functional copies of other, "nonessential" genes. One possibility is that these "nonessential" do confer a 
fitness advantage to the organism, however, the time scale on which organisms are observed to grow during knockout 
experiments is simply too short to resolve these fitness differences. 

Thus, an alternative approach to the deletion set method is to have organisms grow at various mutagen concentra- 
tions. By determining which genes get knocked out at the corresponding mutation rates, it is possible to determine 
the relative importance of various genes to the fitness of an organism. Such an experiment is likely to be difficult to 
perform. Nevertheless, if successful, it would provide a considerably more powerful approach than the deletion set 
method for determining fitness advantages of various genes. 

The results in this paper also shed light on a phenomenon which CO. Wilke termed the "survival of the flattest" 
[2^. Briefly, what Wilke (and others) showed was that at low mutation rates, a population will localize in a region of 
sequence space which has high fitness. At higher mutation rates, a population will relocalize in a region of sequence 
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space which may not have maximal fitness, but is mutationally robust 2^ . 

The error cascade is exactly a relocalization from a region of high fitness but low mutational support to a region of 
lower fitness but higher mutational support. The reason for this is that the fitness landscape becomes progressively 
flatter as more and more genes are knocked out, because the more genes are knocked out, the smaller the fraction of 
the genome which is involved in determining the fitness of the organism. 

This implies that an error cascade is necessary for the "survival of the flattest" principle to hold. Robustness in this 
sense is therefore conferred by modularity in the genome. That is, robustness does not arise because an individual 
gene may remain functional after several point-mutations, but rather arises from the fact that the organism can 
remain viable even if entire regions (e.g. "genes" ) of the genome are knocked out (it should be noted that the idea 
that mutational robustness is conferred by modularity in the genome has been discussed before 

To see this point more clearly, one can consider a "robust" landscape in which the genome consists of a single gene. 
However, unlike the single-fitness peak landscape, the organism is viable out to some Hamming class lyia- Therefore, 
if L'h((t, (To) — I, then ~ I ii I > Ivia, otherwise = ki, where kq > ki > . . . > m^^^ > 1. Using techniques similar 
to the ones used in this paper (neglect of backmutations and stability criterion for equilibria) , it is possible to show 
that the equilibrium mean fitness is exactly kqc^'^, unchanged from the single-fitness peak results. Thus, in contrast 



to robustness studies which consider finite sequence lengths and do not have a well-defined viability cutoff |28| , in the 
limit of infinite sequence length there is no selective advantage in having a genome which can sustain a finite number 
of point mutations and remain viable. 



V. CONCLUSIONS 



This paper developed an extension of the quasispecies model for arbitrary gene networks. We considered the case of 
conservative replication and a genome-independent replication error probability. We showed that, instead of a single 
error catastrophe, the model exhibits a series of localization to delocalization transitions, termed an "error cascade." 

While the numerical example we used in this paper was relatively simple (in order to clearly illustrate the theory 
developed), it is possible to have nontrivial delocalization behavior, depending on the choice of the landscape. For 
example, it is possible that certain genes which are knocked out in one phase can become reactivated again in the 
following phase. That is, instead of a delocalization, one can have a re-localization to source nodes not contained in the 
mutational subgraphs of the source nodes in the previous phase. This implies that the can exhibit discontinuous 
behavior. The types of equilibrium behaviors possible is something which will be explored in future research. 
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Future research also will involve incorporating more details to the multiple-gene model introduced in this paper. 
For example, one extension is to move away from the "single-fitness peak" assumption for each gene. Another 
natural extension is to study the equilibrium behavior of the multiple-gene quasispecies equations for the case of 
semiconservative replication. While this is a subject for future work, we hypothesize that many of the semiconservative 
results would be essentially unchanged from the conservative ones. Thus, we claim that at equilibrium, we would 
still have that K(t = oo) = n^axil^), only this time K,f,ff{v;^) is computed by replacing e^^ with 2e^^/^ — 1. We 
also claim that we would still have that ^max{^^) define the "source" nodes of the equilibrium solution. Indeed, we 
hypothesize that, for semiconservative replication, Eq. (13) becomes, 

n-l 

k=0 {ji,...Jfc}C{ii,...,i„} 

Jl (l-e-"'^/2) (37) 

i6{n,...,i„}/{j"i,...jfc} _ I — I 

Finally, another subject for future work is the incorporation of repair into our network model. In 3j ll^ it was 
assumed that only one gene controlled repair. By assuming that several gene s control repair, then, in analogy with 

I 



fitness, we hypothesize that instead of a single "repair catastrophe" 



18| . we obtain a series of localization to 



delocalization transitions over the repair gene sequence spaces, a "repair cascade." 
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APPENDIX A: DERIVATION OF THE REDUCED SYSTEM OF EQUATIONS 

In this appendix, we derive Eq. (13) from Eq. (11). To this end, define, 

oc oc 

^ 5Z 5Z ^i.ie.l+-+'»,>e,„ (Al) 

We then have that, 

7 OO OO ^il ^in 

ii^=o 'i„=o ;;^=o 'i„=o 
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)eij+... + (ii„-i' )ei„ ;/ ,/ 
("iiM) • • • • ■ (Q!i„M) *"^(i,^-i;je,, +...+(;,„ -ijje,„ 

-K(i)^(i,eii+...+(i„ei„) 

oo oo ^ 

^"^ E ■ • • • • E yr^ ^("iiM)'^^ • • • • ■ (ai„M)'^" X 

oo oo 

E E +-+('*n-'i„)ei„^(iii-iJi)eii +... +(;i„-;^„)ei„ - '«(*)2;{ii,...,i„} 



= g-Cl-^il-'-'-ain)/* 



E--- E 

- '«(*)^{ii,...,i„} 



kii — ki-, — 



= g-(l-«il-----«in)/* 



E E 

(A2) 

fe=0 {ji,---.jl=}C{ii,.--,in} 

We now claim that, 

n 

%,...,in}=E(-l)""' E ^Ol.-..-^} (A3) 

fc=o {ii,...jfc}c{<i,...,i„} 
This can be proved by induction. For n = this statement is clearly true, since = ig. Suppose then, that for some 
n > 0, the statement is true for all < m < n. Then we have, 

n+l 

E E 

^=0 {ii,...,ife}c{ii,...,i„+i} 

= ^{il,...,in + l} 



+ E E 

^==0 {jl>---Jfe}C{il,.--,jn + l} 



(A4) 



and so. 



,in + l} '^{il)---)in + l} 

-E E E(-i)'-'x 

k=0{ji,...,jk}^{ii,--;in+i} 1=0 

Now, for each set {ji, . . . ,jk} appearing in the sum, a given subset . . . , j;'} occurs only once. The fc-element 
sets {ji, ■ ■ ■ ,jk} which contain as a subset must be of the form {ji,---,j'i} U {j" , ■ ■ ■ ,jk-i}, where 

{ii'j • • • ^ {h, • • • ,^n+i}/{ji, • • • Therefore, there are ("^^7*) distinct fc-element sets which contain 

{ji,' , . . . Rearranging the sum, we obtain. 
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k-l 



n / 

E(-i)'-H' 



— ^{i\,...,in^\\ 

n 

-E E 

' = {ilv>j!}C{il,.--,»n + l} 

n+1 

= ^(-1)"+^-' Yl (A6) 

This completes the induction step, and proves the claim. 

We are almost ready to derive the expression for dz[i^^,„^i^y/dt. Before doing so, we state the following identity, 
which we will need in our calculation: 

n n 

n(l-ai) = E(-l)' E "..•••••"i. (A7) 

»=1 k=0 {ii,...,ifc}C{l,...,n} 



We now have. 



dt f^} ' . ■f^. . dt 



E(-ir-' E 

^=0 {3i,---dk}Q{n,---4n} 

n k 

'==0 {ji,...Jfe}c{n,...,i„} 1=0 {j[,...,j[}iz{h,...,jk} 

n 

E E 

'=0 {jiv>ji}C{ii,...,i„} 

n 

fe=' {jiv>Jfc_JC{ii,...,i„}/{ji,...,jj} 

n 

E E 

'=0 {jlv>ji}^{»lv>»n} 

n— / 

fe-'=o tt,...,ji:_jc{n,...,i„}/{ji,...,j,} 



E E 

'=0 {jlv>j!}C{il,...,in} 

ie{ii,...,i„}/{ii,...Ji} 

-«(i)%i,...,z„} 

n 

'=0 {iiviii}C{ii,...,in} i6{jl,.--i»n}/{ili---iJ!} 



which is exactly Eq. (13). 



APPENDIX B: DERIVATION OF THE LOCALIZATION LENGTHS 



In this section we derive the expression for d{li)^-_^ in}/*^*- have, 



dt - E ••••• E E ••••• E E !...../' !/'! 

h,=0 l,=Oli=0 I' =0 I'. =0l'.=0 '1 * 

(ai,M)'-i • ••• • (Qi„/i)'^'.(aiM)'-2(;,,-ijje,,+...+(i*„-;<je,„+(i^-;;)e, 

oo oc oo / -./^ f \V / \V 



V \ . .y. l/M 
=0 =o;'=o 'i' ■ ■ ■ 



oc oc 



E ••••• E E(^^+^^K^i 

feii=0 fei„=Ofei=0 

n 

^=0 {jiv>jfc}C{ii,...,i„} 

71+1 

fe=o {ii,...,jfc}c{ii,...,i„,i} 



-(l-«n-----«in-«i)/* ' 

fc=0 {ji,...Jfc}C{ii,...,i„} 



-K(i)(^i>{ii,...,in} 



We therefore have that, 



fe=0 {ji,...,jfc}C{ji,...,i„} 



fe=0 {ji,...,jfc}C{ii,...,i„} i=0 {jl,...,jj'}C{ji,...,jfc} 



i=o {j;,...j,'}c{ii,...,i„} 



28 



n—l 

n 

/s=0 {ji,.--.jfe}C{ii,.--,»n} 

n (l-e-^^) 

je{ii,...,i„}/{ji,...,jfc} 

which is exactly the expression in Eq. (25). 

APPENDIX C: NUMERICAL DETAILS 

The finite sequence length equations, given by Eq. (11), may be expressed in vector form, 

^^=m-{H.z)z (CI) 



At equilibrium, we therefore have that, 



z= (C2) 

K - Z 

The equilibrium solution may be found using fixed-point iteration, via the equation, 

Zn+l = — — —'BZn (C3) 
K • Zfi 

The iterations are stopped when the Zn stop changing. This is determined by introducing a cutoff parameter 5, and 
stop iterating when the fractional change of each of the components after iterations is smaller than 5. is chosen 
to be sufficiently large so that, on average, each base mutates at least once after iterations. Thus, we choose 
N, = 1/e. 

What this method does is account for the fact that equilibration takes longer for smaller values of e. This means 
that the smaller the value of e, the more times it is necessary to iterate before comparing the changes in the £„. For 
our two-gene simulation, we took S = 10"'*, and zq = (1,1). We chose this initial condition to show that, even though 
backmutations may become small at large sequence lengths, they still strongly affect the equilibrium solution. By 
iterating a sufficient number of times, the cumulative effect of the backmutations becomes sufficiently large to lead to 
a unique equilibrium solution, independent of the initial condition. 
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